End-to-end learning of dynamical systems with black-box models, such as neural ordinary differential equations (ODEs), provides a flexible framework for learning dynamics from data without prescribing a mathematical model for the dynamics. Unfortunately, this flexibility comes at the cost of understanding the dynamical system, a need that is ubiquitous in practice. Moreover, experimental data are often collected under various conditions (inputs), such as treatments, or are grouped in some way, such as being part of a sub-population. Understanding the effects of these system inputs on system outputs is crucial for any meaningful model of a dynamical system. To that end, we propose a structured latent ODE model that explicitly captures system input variations within its latent representation. Building on a static latent variable specification, our model learns (independent) stochastic factors of variation for each system input, thereby separating the effects of the system inputs in the latent space. This approach enables actionable modeling through the controlled generation of time-series data for novel input combinations (or perturbations). In addition, we propose a flexible approach to uncertainty quantification that leverages a quantile regression formulation. Results on challenging biological datasets show consistent improvements over competitive baselines in the controlled generation of observational data and in the inference of biologically meaningful system inputs.
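To make the quantile-regression idea concrete, here is a minimal sketch of the pinball loss that such a formulation typically minimizes; the function and the interval construction below are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss: minimized in expectation when y_pred is the
    tau-th quantile of the target distribution."""
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# e.g. train two predictors with tau = 0.05 and tau = 0.95 to obtain a 90% prediction interval
```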
A growing number of applications in computer vision are challenged when the goal is to classify very large images that contain tiny informative objects. Specifically, these classification tasks face two key challenges: i) the input images are often on the order of mega- or giga-pixels, yet existing deep architectures cannot easily operate on such large images due to memory constraints, so we seek a memory-efficient way to process them; and ii) only a very small fraction of the input image is informative, resulting in a low region-of-interest (ROI) to image ratio. However, most current convolutional neural networks (CNNs) are designed for image classification datasets with relatively large ROIs and small image sizes (sub-megapixel). Existing approaches address these two challenges in isolation. We introduce an end-to-end CNN model, termed the Zoom-In network, that leverages hierarchical attention sampling to classify large images with tiny objects using a single GPU. We evaluate our method on four large-image histopathology, road-scene, and satellite imaging datasets, as well as one pathology dataset. Experimental results show that our model achieves higher accuracy than existing methods while requiring fewer memory resources.
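A rough, single-level sketch of the attention-sampling idea that such a model builds on (the Zoom-In network applies it hierarchically); `attn_net`, `feat_net`, and all sizes are hypothetical placeholders, not the paper's modules.

```python
import torch
import torch.nn.functional as F

def attention_sample_features(image, attn_net, feat_net, n_patches=16, patch=64, scale=8):
    """Single-level attention sampling on a large image.

    image: (C, H, W) tensor; attn_net scores a downsampled view, feat_net embeds
    sampled high-resolution patches. Both networks are assumed, illustrative modules.
    """
    C, H, W = image.shape
    low = F.avg_pool2d(image.unsqueeze(0), scale)          # cheap low-resolution view
    attn = attn_net(low).squeeze()                         # (H // scale, W // scale) attention scores
    probs = torch.softmax(attn.flatten(), dim=0)
    idx = torch.multinomial(probs, n_patches, replacement=True)
    feats = []
    for i in idx.tolist():                                 # map sampled cells back to high-res patches
        y, x = (i // attn.shape[1]) * scale, (i % attn.shape[1]) * scale
        top = min(max(y - patch // 2, 0), H - patch)
        left = min(max(x - patch // 2, 0), W - patch)
        crop = image[:, top:top + patch, left:left + patch]
        feats.append(feat_net(crop.unsqueeze(0)))
    # Monte Carlo estimate of the attention-weighted feature, computed without
    # ever running feat_net on the full-resolution image
    return torch.cat(feats).mean(dim=0)
```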
Handling severe class imbalance poses a significant challenge for real-world applications, particularly when accurate classification and generalization on the minority classes are of primary interest. In computer vision, learning from long-tailed datasets is a recurring theme, especially for natural image datasets. While existing solutions mostly appeal to sampling or reweighting adjustments to alleviate the pathological imbalance, or impose inductive biases to prioritize non-spurious associations, we instead promote sample efficiency and model generalization based on the invariance principle of causality. Our proposal posits a meta-distributional scenario in which the data-generating mechanism is invariant across label-conditional feature distributions. Such a causal assumption enables efficient knowledge transfer from the dominant classes to their under-represented counterparts, even if their respective feature distributions show apparent disparities. This allows us to leverage a causal data-augmentation procedure to enlarge the representation of the minority classes. Our development is orthogonal to existing techniques for extreme classification and can therefore be seamlessly integrated with them. The utility of our proposal is validated on an extensive set of synthetic and real-world computer vision tasks against state-of-the-art solutions.
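One simple way to realize this kind of head-to-tail transfer, assuming the shared mechanism is captured by second-order feature statistics, is sketched below; it illustrates the general idea of borrowing structure from dominant classes, not the authors' procedure.

```python
import numpy as np

def augment_tail_features(feats_head, feats_tail, n_new=100, seed=0):
    """Synthesize extra features for a tail class by borrowing the covariance
    structure estimated from an abundant head class (assumed shared mechanism),
    while keeping the tail class's own mean."""
    rng = np.random.default_rng(seed)
    cov = np.cov(feats_head, rowvar=False)      # (d, d), estimated where data is plentiful
    mean_tail = feats_tail.mean(axis=0)         # (d,), estimated from the few tail samples
    return rng.multivariate_normal(mean_tail, cov, size=n_new)
```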
Credit scoring models are the primary instrument used by financial institutions to manage credit risk. The scarcity of research on behavioral scoring is due to the difficulty of accessing the data: the need to maintain the privacy and security of borrowers' information keeps financial institutions from collaborating in research initiatives. In this work, we present a methodology that allows us to evaluate the performance of models trained with synthetic data when they are applied to real-world data. Our results show that synthetic data quality degrades as the number of attributes increases. However, creditworthiness assessment models trained with synthetic data show a reduction of only 3\% in AUC and 6\% in KS when compared with models trained with real data. These results are significant because they encourage credit risk research based on synthetic data, making it possible to preserve borrowers' privacy and to address problems that until now have been hampered by the lack of available information.
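For reference, a minimal sketch of the two evaluation metrics compared above (AUC and the Kolmogorov-Smirnov statistic) computed from model scores; variable names are illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_and_ks(y_true, scores):
    """AUC and KS for a binary credit-scoring model; y_true in {0, 1}, scores are
    predicted probabilities of default (or any monotone risk score)."""
    auc = roc_auc_score(y_true, scores)
    # KS: maximum gap between the empirical score CDFs of the two classes
    thresholds = np.unique(scores)
    cdf_pos = np.array([(scores[y_true == 1] <= t).mean() for t in thresholds])
    cdf_neg = np.array([(scores[y_true == 0] <= t).mean() for t in thresholds])
    ks = np.abs(cdf_pos - cdf_neg).max()
    return auc, ks
```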
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
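As an illustration of the k-fold training and ensembling pattern the respondents describe, a minimal sklearn-style sketch; the `make_model` factory is a hypothetical stand-in for any probabilistic classifier.

```python
import numpy as np
from sklearn.model_selection import KFold

def kfold_ensemble_predict(make_model, X, y, X_test, k=5):
    """Train one model per fold of the training set, then average the fold models'
    predictions on the test set (ensembling over identical architectures)."""
    preds = []
    for train_idx, _ in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = make_model()                       # fresh model for this fold (assumed factory)
        model.fit(X[train_idx], y[train_idx])
        preds.append(model.predict_proba(X_test)[:, 1])
    return np.mean(preds, axis=0)                  # ensemble by averaging fold predictions
```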
Despite the impact of psychiatric disorders on clinical health, early-stage diagnosis remains a challenge. Machine learning studies have shown that classifiers tend to be overly narrow in the diagnosis prediction task. The overlap between conditions leads to high heterogeneity among participants that is not adequately captured by classification models. To address this issue, normative approaches have surged as an alternative method. By using a generative model to learn the distribution of healthy brain data patterns, we can identify the presence of pathologies as deviations or outliers from the distribution learned by the model. In particular, deep generative models showed great results as normative models to identify neurological lesions in the brain. However, unlike most neurological lesions, psychiatric disorders present subtle changes widespread in several brain regions, making these alterations challenging to identify. In this work, we evaluate the performance of transformer-based normative models to detect subtle brain changes expressed in adolescents and young adults. We trained our model on 3D MRI scans of neurotypical individuals (N=1,765). Then, we obtained the likelihood of neurotypical controls and psychiatric patients with early-stage schizophrenia from an independent dataset (N=93) from the Human Connectome Project. Using the predicted likelihood of the scans as a proxy for a normative score, we obtained an AUROC of 0.82 when assessing the difference between controls and individuals with early-stage schizophrenia. Our approach surpassed recent normative methods based on brain age and Gaussian Process, showing the promising use of deep generative models to help in individualised analyses.
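A minimal sketch of the evaluation described above, assuming per-scan negative log-likelihoods from the trained normative model are already available as arrays; names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def normative_auroc(nll_controls, nll_patients):
    """Use each scan's negative log-likelihood under the normative model as a
    deviation score and measure how well it separates patients from controls."""
    scores = np.concatenate([nll_controls, nll_patients])   # higher NLL = more atypical scan
    labels = np.concatenate([np.zeros(len(nll_controls)), np.ones(len(nll_patients))])
    return roc_auc_score(labels, scores)
```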
Weakly-supervised object detection (WSOD) models attempt to leverage image-level annotations in lieu of accurate but costly-to-obtain object localization labels. This oftentimes leads to substandard object detection and localization at inference time. To tackle this issue, we propose D2DF2WOD, a Dual-Domain Fully-to-Weakly Supervised Object Detection framework that leverages synthetic data, annotated with precise object localization, to supplement a natural image target domain, where only image-level labels are available. In its warm-up domain adaptation stage, the model learns a fully-supervised object detector (FSOD) to improve the precision of the object proposals in the target domain, and at the same time learns target-domain-specific and detection-aware proposal features. In its main WSOD stage, a WSOD model is specifically tuned to the target domain. The feature extractor and the object proposal generator of the WSOD model are built upon the fine-tuned FSOD model. We test D2DF2WOD on five dual-domain image benchmarks. The results show that our method results in consistently improved object detection and localization compared with state-of-the-art methods.
The renewed interest from the scientific community in machine learning (ML) is opening many new areas of research. Here we focus on how novel trends in ML are providing opportunities to improve the field of computational fluid dynamics (CFD). In particular, we discuss synergies between ML and CFD that have already shown benefits, and we also assess areas that are under development and may produce important benefits in the coming years. We believe that it is also important to emphasize a balanced perspective of cautious optimism for these emerging approaches.
This paper presents a proof-of-concept method for classifying chemical compounds directly from NMR data without performing structure elucidation. This can help to reduce the time needed to find good structure candidates, since in most cases matching must be done by a human engineer, or at the very least a matching procedure must be meaningfully interpreted by one. Automation in the area of NMR has therefore long been actively sought. The method identified as suitable for the classification is a convolutional neural network (CNN). Other methods, including clustering and image registration, were not found suitable for the task in a comparative analysis. The results show that deep learning can offer solutions to automation problems in cheminformatics.
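For illustration, a small 1D CNN of the kind that could classify a spectrum treated as a fixed-length signal; the layer sizes and class count below are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpectrumCNN(nn.Module):
    """Toy 1D CNN mapping an NMR spectrum (n_points intensities) to class logits."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # global pooling keeps the model length-agnostic
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                          # x: (batch, 1, n_points)
        return self.classifier(self.features(x).flatten(1))

# usage sketch: logits = SpectrumCNN()(torch.randn(8, 1, 4096))
```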
In recommender systems, items are exposed to a variety of users, and we would like to learn a new user's familiarity with an existing item. This can be framed as an anomaly detection (AD) problem that distinguishes between "regular users" (nominal) and "novel users" (anomalous). Given the sheer number of items and the sparsity of user-item paired data, independently applying conventional single-task detection methods to each item quickly becomes infeasible, and the correlations between items are ignored. To address this multi-task anomaly detection problem, we propose collaborative anomaly detection (CAD) to jointly learn all tasks, with embeddings encoding the correlations among tasks. We explore CAD with conditional density estimation and with conditional likelihood-ratio estimation. We find that: i) learning to estimate the likelihood ratio is more sample-efficient and performs better than density estimation; ii) it is beneficial to select a small number of tasks in advance to learn a task-embedding model and then use it to warm-start the embeddings of all tasks. These embeddings can therefore capture the correlations between tasks and generalize to new, correlated tasks.
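A single-task sketch of likelihood-ratio estimation via the classifier-based density-ratio trick (CAD additionally shares task embeddings across items); the function name and the logistic-regression choice are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def likelihood_ratio_scores(x_nominal, x_candidate, x_test):
    """Estimate r(x) = p_candidate(x) / p_nominal(x) by training a probabilistic
    classifier to separate the two samples (assumes roughly balanced sample sizes)."""
    X = np.vstack([x_nominal, x_candidate])
    y = np.concatenate([np.zeros(len(x_nominal)), np.ones(len(x_candidate))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = clf.predict_proba(x_test)[:, 1]
    return p / (1.0 - p + 1e-12)          # r(x) ≈ p(y=1|x) / p(y=0|x)
```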